Moneyball a book by Michael Lewis was a story about the Oakland Athletics a team that was limited to a small budget and was able to achieve success in Major League Baseball by using team analytics to choose the right players that could produce a successful regular season. Many NBA teams nowadays use team, player and game analytics to try to find an advantage in a competitive basketball league. As a general manager, scout, owner and even avid fan the quest to put together the perfect team that remains relatively close to the salary cap is the goal for the regular season. Ultimately choosing the right players to put together a championship dynasty is the goal for every city that has a NBA team.
What we are trying to solve and explore:
Therefore the question that we are trying to solve is, whether Moneyball will work in the NBA for a regular season like it did for the Oakland Athletics and can we identify trends and build upon these findings.
The goal is to make better informed decisions in salary contract talks as well as scouting in the free agency market. Unlike baseball the NBA has a soft cap which will allow teams to have exceptions and be able to spend money on players even though they are past the salary cap. A most recent notable signing is Demarcus Cousins for the Golden State Warriors. (He was signed using the mid-level exception for the 2018-2019 season.)
The data was collected from the following sources.
The hoops hype data contained the salary data for NBA players from 2001-2019. The 2000 salary data was collected from Particia benders website. Salary data collection was collected by using rvest to web scrap the table from the two websites.
salary_2018 <- read_html("https://hoopshype.com/salaries/players/2017-2018/") %>%
html_nodes("table") %>% html_table
Salary data was checked by looking a business insider article which took a look at some of the top paid players in the NBA.
College, age and weight data was collected through NBA stats and loaded through a csv file. NBA player statistics was collected through Basketball reference. The website provided csv files for player statistics.
Land of Basketball provided data on award and number of team wins.
The NBA collective bargaining agreement and sample contract of NBA players provided insights on the structure of NBA contracts.
The following packages were used to clean the data.
library(dplyr)
library(ggplot2)
library(tidyr)
library(stringr)
library(data.table)
library(ggrepel)
library(directlabels)
library(gridExtra)
options(max.print = 999999999)
options(scipen=12)
Initial Data cleaning
The data for the 2000 to 2019 data was cleaned year by year since there were many unique names for each season which required me to search for the name using grepl and changing it. The data was extremely raw and using different sources yielded different spellings of players and teams.
The first step was to remove unnecessary columns that weren’t going to be used. We sure to change the column name for the players names to names for before doing a full join.
# clean up stats
#remove rank column
stat <- stat[,-c(1,5)]
#Seperate player and player id
stat <- stat %>% separate(Player, c("player_name", "player_id"),"\\\\")
##Clean up to remove . from stat_2018
grep("\\.", stat$player_name, value = TRUE)
stat$player_name<- gsub("\\.", "", stat$player_name)
grep("\\.", stat$player_name, value = TRUE)
names(stat)
stat$Age <- as.numeric(stat$Age)
colnames(stat)[30] <- "ppg"
##Clean up salary
salary <- salary[,-c(3)]
#salary <- salary[-1,]
colnames(salary) <- c("player_name", "salary")
rownames(salary) <- NULL
grep("\\.",salary$player_name)
grep(".Jr$", salary$player_name, value = TRUE) # 3 Jrs
#remove columns from age data
age <- age[,-c(3,11:14,16:20)]
#remove rows that have Player
y<-which(age$PLAYER == "PLAYER")
age <- age[-c(y),]
#Change PLAYER to player_name
names(age)[1]<-paste("player_name")
Had to clean the data to remove all “-“, “*”, “,” from the name columns. Also, had to add in JR or III for data sources like Basketball reference that removes the suffix.
Used grepl and gsub to accomplish this.
#getting rid of periods in name
grep("\\*",age$player_name, value = T)
grep("\\*",stat$player_name, value = T)
stat$player_name <- gsub("\\*","",stat$player_name)
grep("Ray Allen",stat$player_name, value =T)
grep("\\*",salary$player_name, value = T)
grep("\\.",age$player_name)
age$player_name <- gsub("\\.","", age$player_name)
grep(".Jr$", age$player_name, value = TRUE) # 6 Jrs here.
grep("Glen Rice", age$player_name, value = TRUE)
grep(".Jr$",stat$player_name, value = TRUE)
#look through all stats for which one junior is missing for a total of 13 was missing in stats.
grep(".Jr$", salary$player_name, value = TRUE)
##### Filling all the juniors
#stat
grep("Roger Mason",stat$player_name, value = TRUE)
stat$player_name<-gsub("Roger Mason", "Roger Mason Jr", stat$player_name)
#testing salary
grep("Roger Mason",salary$player_name, value = TRUE)
salary$player_name<-gsub("Roger Mason", "Roger Mason Jr", salary$player_name)
#-------------------------
There were also, weird name spellings like Nene Hilario that was spelled as Nene, or other international players who had their country passport name instead of their NBA USA player name. For example Hedo Turkoglu as Hidayet Turkloglu. Changed all basketball player names to their USA player names. As well as changed Ron Artests name to Metta World Peace.
This was accomplished by using gusb and grepl as well as which.
#nene hilario weird spelling
grep("^Ne", stat$player_name, value = TRUE)
grep("^Nene", age$player_name, value = TRUE) # located in age
grep("^Ne", age$player_name)
age$player_name <- gsub("Nene", "Nene Hilario", age$player_name)
grep("^Nen", salary$player_name, value = TRUE)
salary$player_name <- gsub("Nenê", "Nene Hilario", salary$player_name)
There were some issues with duplicate names as well due to player movement for that year. Removed the duplicated names and categorized the player into the last team he played for that season.
This was accomplished using which, duplicated and the dplyer package using the pipe operator to group_by and filter.
#multiple trade scenarios where a player ended up on different teams
which(duplicated(stat$player_name))
sum(duplicated(stat$player_name))
stat %>% group_by(player_name) %>% filter(n() > 1)
a<-which(duplicated(stat$player_name))
#Will use last team played for when doing full join so remove duplicates
stat<- stat[-c(a),]
rownames(stat) <- NULL
After performing a full join which was joined by the players name, fixed the problem of height which was loaded in as a date instead of feet and inches.
Used grepl and gsub to change this.
#time to combine all of them
data <- full_join(salary, age, by = "player_name")
data <- full_join(data, stat, by = "player_name")
#have to fill in the N/As now.
colnames(data)[2] <- "salary"
colnames(data)[39] <- "ppg"
sum(is.na(data$salary))
which(is.na(data$salary))
sum(is.na(data$TEAM))
sum(is.na(data$ppg))
#find duplicated rows
which(duplicated(data$player_name))
#change the height data so it's not in date format
data$HEIGHT <- gsub("Jun", "6", data$HEIGHT, fixed=TRUE)
data$HEIGHT <- gsub("Jul", "7", data$HEIGHT, fixed=TRUE)
data$HEIGHT <- gsub("May", "5", data$HEIGHT, fixed=TRUE)
data$HEIGHT[which(data$HEIGHT == "11-6")] = '6-11"'
data$HEIGHT[which(data$HEIGHT == "10-6")] = '6-10"'
data$HEIGHT[which(data$HEIGHT == "9-6")] = '6-9"'
data$HEIGHT[which(data$HEIGHT == "8-6")] = '6-8"'
data$HEIGHT[which(data$HEIGHT == "7-6")] = '6-7"'
data$HEIGHT[which(data$HEIGHT == "5-6")] = '6-5"'
data$HEIGHT[which(data$HEIGHT == "4-6")] = '6-4"'
data$HEIGHT[which(data$HEIGHT == "3-6")] = '6-3"'
data$HEIGHT[which(data$HEIGHT == "2-6")] = '6-2"'
data$HEIGHT[which(data$HEIGHT == "1-6")] = '6-1"'
data$HEIGHT[which(data$HEIGHT == "6-00")] = '6-0"'
data$HEIGHT[which(data$HEIGHT == "11-5")] = '5-11"'
data$HEIGHT[which(data$HEIGHT == "10-5")] = '5-10"'
data$HEIGHT[which(data$HEIGHT == "9-5")] = '5-9"'
data$HEIGHT[which(data$HEIGHT == "8-5")] = '5-8"'
data$HEIGHT[which(data$HEIGHT == "7-5")] = '5-7"'
data$HEIGHT[which(data$HEIGHT == "1-7")] = '7-1"'
data$HEIGHT[which(data$HEIGHT == "2-7")] = '7-2"'
data$HEIGHT[which(data$HEIGHT == "3-7")] = '7-3"'
data$HEIGHT[which(data$HEIGHT == "7-00")] = '7-0"'
data$HEIGHT[which(data$HEIGHT == "6'6")] = "6'6\""
#change the symbols - to 4'2"
data$HEIGHT <- gsub("-", "'", data$HEIGHT, fixed=TRUE)
grep("[I]", data$player_name, value = TRUE)
#--------------------------#
There were several NAs in the rows so created another CSV file to load in the missing data. Information for the missing player statistics was loaded from basketball reference again and for the salary was loaded from Patricia Bender’s data archive.
Used whichis.na
miss_sal<-read.csv("missing_salary_2000.csv", header = F)
colnames(miss_sal)[1:39] <- c("player_name", "salary", "TEAM", "HEIGHT", "WEIGHT", "COLLEGE", "COUNTRY", "DRAFT.YEAR",
"DRAFT.ROUND", "DRAFT.NUMBER", "NETRTG", "player_id", "Pos", "Age", "G", "GS", "MP", "FG", "FGA",
"FG.", "X3P", "X3PA", "X3P.", "X2P", "X2PA", "X2P.", "eFG.", "FT", "FTA", "FT.", "ORB", "DRB",
"TRB", "AST", "STL", "BLK", "TOV", "PF", "ppg")
x<-which(is.na(data$salary))
data <- data[-c(x),]
data<-rbind(data,miss_sal)
The stringr package was used to format the salary column and convert it to a numeric. Inflation data was loaded in using a full_join.
#Making salary into numeric
data$salary <- str_replace(data$salary, "\\$", "")
data$salary <- str_replace_all(data$salary, ",", "")
data$salary <- as.numeric(data$salary)
rownames(data) <- NULL
Then filled in the awards section for most valuable player, All NBA team, defensive player of the year, sixth man, rookie of the year, all defensive team from land of basketball.
# make a column if they won the awards the previous year. #make columns for awards
test <- data
test["MVP"] <- "NO"
test["MIP"] <- "NO"
test["Sixth_man"] <- "NO"
test["DPOY"] <- "NO"
test["ROY"] <- "NO"
test["NBA_1"] <- "NO"
test["NBA_2"] <- "NO"
test["NBA_3"] <- "NO"
test["Rookie_1"] <- "NO"
test["Rookie_2"] <- "NO"
test["Def_1"] <- "NO"
test["Def_2"] <- "NO"
#Nba first team
a1<-grep("Tim Duncan", test$player_name)
test[a1, 48] <- "YES"
a2<-grep("Karl Malone", test$player_name)
test[a2, 48] <- "YES"
a3<-grep("Allen Iverson", test$player_name)
test[a3, 48] <- "YES"
a4<-grep("Jason Kidd", test$player_name)
test[a4,48] <- "YES"
a5<-grep("Alonzo Mourning", test$player_name)
test[a5, 48] <- "YES"
This process was then repeated going through the 2000 to 2019 NBA season.
Rbind was used to bind all the seasons together. Then the data columns were renamed.
data_2018<-read.csv("2018_data.csv")
data_2017<-read.csv("2017_data.csv")
data_2016<-read.csv("2016_data.csv")
data_2015<-read.csv("2015_data.csv")
data_2014<-read.csv("2014_data.csv")
data_2013<-read.csv("2013_data.csv")
data_2012<-read.csv("2012_data.csv")
data_2011<-read.csv("2011_data.csv")
data_2010<-read.csv("2010_data.csv")
data_2009<-read.csv("2009_data.csv")
data_2008<-read.csv("2008_data.csv")
data_2007<-read.csv("2007_data.csv")
data_2006<-read.csv("2006_data.csv")
data_2005<-read.csv("2005_data.csv")
data_2004<-read.csv("2004_data.csv")
data_2003<-read.csv("2003_data.csv")
data_2002<-read.csv("2002_data.csv")
data_2001<-read.csv("2001_data.csv")
data_2000<-read.csv("2000_data.csv")
data<-rbind(data_2019,data_2018,data_2017,data_2016,data_2015,data_2014,data_2013,data_2012,data_2011,data_2010, data_2009,
data_2008, data_2007, data_2006, data_2005, data_2004, data_2003, data_2002, data_2001, data_2000)
#Error in match.names(clabs, names(xi)) :
#names do not match previous names
colnames(data_2019) == colnames(data_2000)
names(data)
We ended up having 9015 observations. However, I decided to remove the 2019 data set since the season was still going on and I did not want to include an incomplete season with the rest of the data.
Also, ran a check for players who were injured, waived or released. Salaries would still be included on the team due to contractual obligations. (The league included stretch provisions where teams could stretch out the payment of a contract over several years this would allow teams some salary cap relief.)
Removed players who did not even attempt a single 2-point field goal. Also removed players who were not able to play more than 5 minutes per game. This reduced the number of observations to 7978. The reason why I removed the data was because I wanted to include players who played meaningful minutes on average and players who at least attempted a 2 point shot.
lessthanfive<-which(df$Games <5)
df <- df[-lessthanfive,]
lessthanfiveminGame <- which(df$MinGames < 5)
df <- df[-lessthanfiveminGame,]
lessthanoneppg<-which(df$ppg < 1)
df <- df[-lessthanoneppg,]
#decided not to use the incomplete 2019 dataset
twennineteen<-which(df$PlayerYear == 2019)
df <- df[-twennineteen,]
rownames(df) <- NULL
In the time period from 2000 to 2018, the Vancouver Grizzlies became the Memphis Grizzlies, the Seattle Supersonics became the Oklahoma City Thunder, the New Jersey Nets became the Brooklyn Nets, the Charlotte Hornets became the New Orleans Pelicans, Charlotte under the leadership of Michael Jordan got their team back and got their team back but were named the Bobcats before switching the name back to the Hornets.
I chose to change the old team names to the newer one. I used grepl and gsub to perform this task. For example from 2000 to 2007 the Seattle Supersonics team would be changed to the Oklahoma City Thunder. While, this is inaccurate it won’t affect the players data and stats but it will keep track of team win history through 2000 to 2018.
#checking team names
unique(df$team)
grepl("PHX", df$team)
grepl("PHO", df$team)
which(df$team == "PHO")
df$team <- gsub("PHX", "PHO", df$team)
#vancouver grizzlies were 2000 and 2001,
df$team <- gsub("VAN", "MEM", df$team)
#Sonics was from 2008 to 2018,
df$team <-gsub("SEA", "OKC", df$team)
#Nets moved to brooklyn in 2011
df$team <-gsub("NJN", "BRK", df$team)
df$team <-gsub("BKN", "BRK", df$team)
#Charlotte hornets 2000- 2002 then became the bobcats and got name back to be hornets again in 2014
which(df$team == "CHH") #charlote hornets
df[6817,]
df$team <-gsub("CHA", "CHO", df$team)
df$team <-gsub("CHH", "CHO", df$team)
#New Orlean Pelicans
which(df$team == "NOK")
df[4790,] # is the hornets for new orleans
df$team <-gsub("NOH", "NOP", df$team)
df$team <-gsub("NOK", "NOP", df$team)
unique(df$team)
Data Sources each with their own unique spelling for names and team symbols.
Some of the sources used different symbols for different teams. For example, the Charlotte Hornets would be shortened to CHA on one site, CHH on another source and CHO on another. The Phoenix Suns also had a similar problem where the teams symbol would be PHO and PHX.
I chose CHO for Charlotte and PHO for Phoenix. I used which, gsub, grepl and unique.
The Charlotte Hornets data set refused to load in. Ended up using data.table package to input the data for the number of wins.
setDT(data)
data[i = PlayerYear == 2018 & team == "CHO",
j = TeamWins := 36]
data[i = PlayerYear == 2017 & team == "CHO",
j = TeamWins := 36]
Changed height from feet and inches to centimeters so I will be able to do a correlation analysis for height and salary.
#want to change height to cm to calculate average joe data
df$height_ft<-gsub("7'6\"", 229, df$height_ft)
df$height_ft<-gsub("7'5\"", 226, df$height_ft)
df$height_ft<-gsub("7'4\"", 224, df$height_ft)
df$height_ft<-gsub("7'3\"", 221, df$height_ft)
df$height_ft<-gsub("7'2\"", 219, df$height_ft)
df$height_ft<-gsub("7'1\"", 216, df$height_ft)
df$height_ft<-gsub("7'0\"", 214, df$height_ft)
df$height_ft<-gsub("6'11\"", 211, df$height_ft)
df$height_ft<-gsub("6'10\"", 208, df$height_ft)
df$height_ft<-gsub("6'9\"", 206, df$height_ft)
df$height_ft<-gsub("6'8\"", 203, df$height_ft)
df$height_ft<-gsub("6'7\"", 201, df$height_ft)
df$height_ft<-gsub("6'6\"", 198, df$height_ft)
df$height_ft<-gsub("6'6", 198, df$height_ft)
df$height_ft<-gsub("6'5\"", 196, df$height_ft)
df$height_ft<-gsub("6'4\"", 193, df$height_ft)
df$height_ft<-gsub("6'3\"", 191, df$height_ft)
df$height_ft<-gsub("6'2\"", 188, df$height_ft)
df$height_ft<-gsub("6'1\"", 186, df$height_ft)
df$height_ft<-gsub("6'0\"", 183, df$height_ft)
df$height_ft<-gsub("5'11\"", 180, df$height_ft)
df$height_ft<-gsub("5'10\"", 178, df$height_ft)
df$height_ft<-gsub("5'9\"", 175, df$height_ft)
df$height_ft<-gsub("5'8\"", 173, df$height_ft)
df$height_ft<-gsub("5'7\"", 170, df$height_ft)
df$height_ft<-gsub("5'6\"", 168, df$height_ft)
df$height_ft<-gsub("5'5\"", 165, df$height_ft)
df$height_ft<-gsub("5'4\"", 163, df$height_ft)
df$height_ft<-gsub("5'3\"", 160, df$height_ft)
df <-df[,-c(1:2)]
colnames(df)[4] <- "height_cm"
str(df)
df$height_cm <- as.numeric(df$height_cm)
height_na<-which(is.na(df$height_cm))
Additional Variables added from the initial Data Wrangling Process
Number of years in the NBA Reason this was added in, is because the number of years played under the collective bargaining agreement awards players higher available salary for the more years they played in the league. The minimum for a player who played 1 year will be vastly different than a player who’s played 10 years and is getting the minimum from the team. (Under the current NBA CBA a 1 year minimum salary player could only receive 1,349,383 dollars and this contract could only be extended for 2 years while a 10 year player would receive 2,393,887 dollars the first year and this contract could be extended for up to 5 years.
Team Salary, Salary Cap, Over/Under
These 3 variables were added to compare the change of team salary and salary cap over the years and how much the team was over and under the cap.
#load in team salary want to see how much each player salary takes from total team salary
#see if they are worth it. #2004 changed to 30 teams
tmSal_18<-read.csv("hoops_hype_salaryt_2018.csv",header = F)
tmSal_18 <- tmSal_18[-c(1:2),-c(1:2,5)]
tmSal_18["PlayerYear"] <- 2018
colnames(tmSal_18)[1:2]<- c("team","tmsalary")
rownames(tmSal_18) <- NULL
tmSal <- rbind(tmSal_00,tmSal_01, tmSal_02, tmSal_03, tmSal_04, tmSal_05,
tmSal_06, tmSal_07, tmSal_08, tmSal_09, tmSal_10,
tmSal_11, tmSal_12, tmSal_13, tmSal_14, tmSal_15,
tmSal_16, tmSal_17, tmSal_18)
tmSal$team<-gsub("LA Clippers", "LAC", tmSal$team)
tmSal$team<-gsub("Vancouver Grizzlies", "MEM", tmSal$team)
tmSal$team<-gsub("Memphis", "MEM", tmSal$team)
tmSal$team<-gsub("Seattle Sonics", "OKC", tmSal$team)
tmSal$team<-gsub("Oklahoma City", "OKC", tmSal$team)
tmSal$team<-gsub("Charlotte Hornets", "CHO", tmSal$team)
tmSal$team<-gsub("Charlotte", "CHO", tmSal$team)
tmSal$team<-gsub("New Orleans", "NOP", tmSal$team)
I ended up removing state taxes from the data. The reason why is because this information would require advice from a tax specialist. Each state has their own different structure for tax brackets depending on the players income. Jock tax also applies to the NBA. Jock tax is where a player would be taxed in the city they played in. For example, there are 82 games in one season. 41 of those games would be taxed in the city the player signed with while the other 41 games would be taxed according to where the player played for away games. In order to try to maintain accuracy I did not want to use a generic state tax and apply it to salary for each player. Therefore, this data was not included because more research is needed to fully understand the Canadian and USA state tax systems.
1. The Distribution of our Data
Salary Spread
Salary distribution is right skewed, the structure of NBA contracts under the new CBA allows players to take up to 35 percent of the salary cap. Therefore, there will be a few players getting huge paydays while on average role players will be getting a much smaller salary in comparison.
Threes Made Spread
Eventhough basketball is considered a team sport. You can see from the two point and three pointer made distrbution it is right skewed as well. This is an entertainment industry where you want your best players taking the most shots and creating the most interesting news and media headlines which in terms will indirectly drive more revenue for your team.
Total Rebound Spread
Total rebounds is right skewed as well. The reasoning behind this is because of todays play style. Where one player will rebound the ball and a few of the other players will begin to race down the court to score on the fast break. Also, due to the increased spacing on the floor now only a few players will be designated inside the painted area to defend.
Turnovers Spread
Turnovers are right skewed as well. The best players hold the ball more during a game and thus the chance of turning the ball over will go up as these players will be doubled and even tripled teamed.
PlusMinus Spread
Plus Minus is normally distributed. Since, a game of basketball is about scoring more points than your opponent and limiting their number of points. There will always be a player who has a negative plus minus after if they allow more points on the floor than they scored and vice versa.
Age Spread
Age spread is a little bit right skewed. What will be interesting to see is, with modern day sports medicine and therapy whether the data will become more and more right skewed in the next 5 years as players will be able to play well into their late 30’s.
2. NBA Media Trends:
The importance of the Three
Through the media and with records being broken for 3 point attempts and makes in the latest NBA season. The 3 point shot and how it is implemented in todays game has changed the way the game has played. From a big man dominant game to a slashing guard game to now a team that emphasizes ball movement and offensive floor spacing.
By plotting a graph we can see that the average number or 2 point attempts has been gradually decreasing while the number of 3 point attempts has been increasing.
However, what we really want to know are teams now paying for players who are making three point shots.
## # A tibble: 11 x 5
## PlayerYear `mean(salary)` `median(salary)` `max(salary)` `min(salary)`
## <int> <dbl> <dbl> <dbl> <dbl>
## 1 2002 11250000 11250000 11250000 11250000
## 2 2004 6250000 6250000 6250000 6250000
## 3 2006 13223140 13223140 13223140 13223140
## 4 2007 14611570 14611570 14611570 14611570
## 5 2008 11111110 11111110 11111110 11111110
## 6 2013 3958742 3958742 3958742 3958742
## 7 2014 9098071 9098071 9887642 8308500
## 8 2015 6852546. 6852546. 10629213 3075880
## 9 2016 10369357. 11370786 15501000 4236286
## 10 2017 14041261. 12112359 26540100 6587131
## 11 2018 24015411. 26153057 34682550 12943020
## # A tibble: 19 x 5
## PlayerYear `mean(salary)` `median(salary)` `max(salary)` `min(salary)`
## <int> <dbl> <dbl> <dbl> <dbl>
## 1 2000 2943428. 1968750 17142858 2853
## 2 2001 3514447. 2250000 19610000 40000
## 3 2002 3566080. 2400000 22400000 48705
## 4 2003 3844233. 2530920 25200000 34946
## 5 2004 3898900. 2200000 28000000 32375
## 6 2005 4016926. 2410000 27696430 45369
## 7 2006 4208367. 2750000 20000000 24315
## 8 2007 4248684. 2645000 21000000 55000
## 9 2008 4686497. 2942040 23750000 42203
## 10 2009 4894865. 3163769 24751934 46812
## 11 2010 4810069. 3000000 23239562 43319
## 12 2011 4633681. 3000000 24806250 55718
## 13 2012 4487611. 2710628. 25244493 42009
## 14 2013 4542179. 2953122 27849000 72892
## 15 2014 4447810. 2652000 30453000 48854
## 16 2015 4491181. 2732000 23500000 48028
## 17 2016 5067255. 2900000 25000000 57726
## 18 2017 6176565. 3522560 30963450 73528
## 19 2018 6326170. 3097800 33285709 77250
If we filter and look at player salary comparison for players who made more than three 3 pointers, the average salary seems to have been decreasing gradually from the 2008 period to around the 2013 period before slowly rising up again. There was a noticeable sharp increase after the 2015 season.
If we further analyze the data and look at the All NBA teams for those seasons, we can see that from the first, second and third All-NBA teams there were really no 3 point specialists at the time. Most on the list were dominant big men or slashing guards like Kobe Bryant, Tracy McGrady and Dwyane Wade. Ray Allen, considered the best 3 point shooter after in history debatable with Reggie Miller at the time only made an All-NBA team during the 2004-2005 season. Out of those years he was the only player on the ALL-NBA team to make more than three 3’s a game during the season.
## name PlayerYear three_att three
## 26 Chris Paul 2014 3.4 1.3
## 31 Blake Griffin 2014 0.6 0.2
## 39 Russell Westbrook 2014 4.7 1.5
## 51 Kevin Durant 2014 6.1 2.4
## 82 Dwight Howard 2014 0.1 0.0
## 104 Tim Duncan 2014 0.1 0.0
## 119 David Lee 2014 0.0 0.0
## 131 Marc Gasol 2014 0.2 0.0
## 143 Kobe Bryant 2014 2.7 0.5
## 153 James Harden 2014 6.6 2.4
## 197 Carmelo Anthony 2014 5.4 2.2
## 259 Tony Parker 2014 1.0 0.4
## 269 Dwyane Wade 2014 0.6 0.2
## 377 Paul George 2014 6.3 2.3
## 389 LeBron James 2014 4.0 1.5
## 452 Kobe Bryant 2013 5.2 1.7
## 543 Tyson Chandler 2013 0.0 0.0
## 577 Rajon Rondo 2013 1.3 0.3
## 607 Russell Westbrook 2013 3.7 1.2
## 642 Dwight Howard 2013 0.1 0.0
## 662 Dirk Nowitzki 2013 3.0 1.2
## 666 Blake Griffin 2013 0.4 0.1
## 685 LeBron James 2013 3.3 1.4
## 692 Carmelo Anthony 2013 6.2 2.3
## 703 Chris Paul 2013 3.3 1.1
## 716 Tony Parker 2013 1.0 0.4
## 762 Kevin Durant 2013 4.1 1.7
## 823 Dwyane Wade 2013 1.0 0.2
## 852 Kevin Love 2013 5.1 1.1
## 881 Pau Gasol 2012 0.4 0.1
## 890 Derrick Rose 2012 4.4 1.4
## 906 Al Horford 2012 0.1 0.0
## 928 Amar'e Stoudemire 2012 0.4 0.1
## 956 Russell Westbrook 2012 3.0 0.9
## 1009 Dwight Howard 2012 0.1 0.0
## 1042 Zach Randolph 2012 0.3 0.1
## 1095 Kobe Bryant 2012 4.9 1.5
## 1145 Kevin Durant 2012 5.2 2.0
## 1215 LaMarcus Aldridge 2012 0.2 0.0
## 1218 LeBron James 2012 2.4 0.9
## 1243 Dwyane Wade 2012 1.1 0.3
## 1254 Dirk Nowitzki 2012 3.4 1.3
## 1271 Chris Paul 2012 3.6 1.3
## 1289 Manu Ginobili 2012 3.7 1.5
## 1328 Kevin Durant 2011 5.3 1.9
## 1385 Carmelo Anthony 2011 3.3 1.2
## 1401 Joe Johnson 2011 4.2 1.2
## 1457 LeBron James 2011 3.5 1.2
## 1467 Steve Nash 2011 2.7 1.1
## 1468 Dirk Nowitzki 2011 2.3 0.9
## 1516 Andrew Bogut 2011 0.1 0.0
## 1535 Deron Williams 2011 4.9 1.6
## 1545 Amar'e Stoudemire 2011 0.3 0.1
## 1567 Dwight Howard 2011 0.1 0.0
## 1606 Pau Gasol 2011 0.0 0.0
## 1631 Kobe Bryant 2011 4.3 1.4
## 1676 Dwyane Wade 2011 2.7 0.8
## 1684 Brandon Roy 2011 2.4 0.8
## 1708 Tim Duncan 2011 0.1 0.0
## 1797 Carmelo Anthony 2010 2.7 0.9
## 1799 Shaquille O'Neal 2010 0.0 0.0
## 1809 LeBron James 2010 5.1 1.7
## 1838 Tim Duncan 2010 0.1 0.0
## 1849 Tony Parker 2010 0.6 0.2
## 1900 Dwight Howard 2010 0.1 0.0
## 1970 Chauncey Billups 2010 5.6 2.2
## 1982 Paul Pierce 2010 3.7 1.5
## 1999 Brandon Roy 2010 3.4 1.1
## 2015 Dwyane Wade 2010 3.2 0.9
## 2025 Kobe Bryant 2010 4.1 1.4
## 2074 Dirk Nowitzki 2010 1.5 0.6
## 2114 Chris Paul 2010 2.8 1.2
## 2116 Pau Gasol 2010 0.1 0.0
## 2144 Kevin Garnett 2009 0.1 0.0
## 2178 Deron Williams 2009 3.3 1.0
## 2200 Carlos Boozer 2009 0.0 0.0
## 2215 Chris Paul 2009 2.3 0.8
## 2239 Kobe Bryant 2009 4.1 1.4
## 2256 Tim Duncan 2009 0.0 0.0
## 2273 Yao Ming 2009 0.0 0.0
## 2275 Tracy McGrady 2009 3.3 1.3
## 2283 Paul Pierce 2009 3.8 1.5
## 2293 Manu Ginobili 2009 4.8 1.6
## 2449 Steve Nash 2009 3.3 1.5
## 2456 LeBron James 2009 4.7 1.6
## 2497 Dwight Howard 2009 0.0 0.0
## 2524 Dirk Nowitzki 2009 2.1 0.8
## 2550 Tim Duncan 2008 0.1 0.0
## 2637 Chauncey Billups 2008 4.4 1.8
## 2653 Kobe Bryant 2008 5.1 1.8
## 2706 Chris Bosh 2008 0.4 0.1
## 2725 Yao Ming 2008 0.0 0.0
## 2744 Carmelo Anthony 2008 2.1 0.8
## 2749 Gilbert Arenas 2008 6.0 1.7
## 2786 Dirk Nowitzki 2008 2.9 1.0
## 2801 LeBron James 2008 4.8 1.5
## 2813 Tracy McGrady 2008 4.5 1.3
## 2820 Kevin Garnett 2008 0.2 0.0
## 2834 Steve Nash 2008 4.7 2.2
## 2918 Dwight Howard 2008 0.0 0.0
## 2919 Dwyane Wade 2008 1.5 0.4
## 2939 Tim Duncan 2007 0.1 0.0
## 2954 Chauncey Billups 2007 4.5 1.6
## 2967 LeBron James 2007 4.0 1.3
## 3068 Gilbert Arenas 2007 7.9 2.8
## 3074 Steve Nash 2007 4.5 2.1
## 3118 Shawn Marion 2007 3.2 1.0
## 3141 Ben Wallace 2007 0.1 0.0
## 3149 Kobe Bryant 2007 5.2 1.8
## 3179 Yao Ming 2007 0.0 0.0
## 3189 Carmelo Anthony 2007 2.3 0.6
## 3244 Dwyane Wade 2007 1.5 0.4
## 3254 Shaquille O'Neal 2007 0.0 0.0
## 3267 Allen Iverson 2007 3.0 1.0
## 3272 Dirk Nowitzki 2007 2.2 0.9
## 3332 Elton Brand 2007 0.0 0.0
## 3400 Kobe Bryant 2006 6.5 2.3
## 3422 Tracy McGrady 2006 5.0 1.6
## 3456 Ben Wallace 2006 0.0 0.0
## 3460 LeBron James 2006 4.8 1.6
## 3506 Ray Allen 2006 8.4 3.4
## 3531 Tim Duncan 2006 0.1 0.0
## 3570 Shaquille O'Neal 2006 0.0 0.0
## 3571 Gilbert Arenas 2006 6.8 2.5
## 3635 Allen Iverson 2006 3.1 1.0
## 3694 Dirk Nowitzki 2006 3.3 1.4
## 3724 Steve Nash 2006 4.3 1.9
## 3735 Dwyane Wade 2006 1.0 0.2
## 3740 Kevin Garnett 2006 0.4 0.1
## 3744 Shawn Marion 2006 3.6 1.2
## 3780 Ben Wallace 2005 0.1 0.0
## 3841 Yao Ming 2005 0.0 0.0
## 3844 Kobe Bryant 2005 5.9 2.0
## 3893 Baron Davis 2005 7.7 2.6
## 3899 Tim Duncan 2005 0.1 0.0
## 3920 Tracy McGrady 2005 5.6 1.8
## 3939 Jermaine O'Neal 2005 0.1 0.0
## 3999 Sam Cassell 2005 1.7 0.5
## 4030 Peja Stojakovic 2005 6.6 2.6
## 4042 Kevin Garnett 2005 0.3 0.1
## 4054 Metta World Peace 2005 2.4 1.0
## 4056 Dirk Nowitzki 2005 2.9 1.2
## 4058 Shaquille O'Neal 2005 0.0 0.0
## 4084 Jason Kidd 2005 5.4 2.0
## 4094 Michael Redd 2005 3.9 1.4
Other 3 point big shot takers were Dirk Nowitzki and Chauncey Billups. However, the number of 3’s made was not over 3 compared to todays NBA. With less emphasis in ball movement and spacing many of these 3’s were perhaps contested in the past.
Then there is the spike in salary after the 2015 season. While this might have some contribution to the Warriors winning their first championship and becoming the first jump shooting team to do so, there were many other factors that increased salary after the 2015 NBA season. We can confirm that this was a spike across all variables related to salary if we take a look at number of 2 pointers made by a player compared to their salary. As well as salaries for players who made less than three 3’s.
During the 2016-2017 free agency there were more buyers in terms of teams bidding for the services of free agents than there were sellers, players who were trying to sell their services to a new team. Notable free agents at the time were Lebron James, Anthony Davis and Kevin Durant. Overall, teams spent a record of 2 billion dollars on cap.
NBA player salary is tied to BRI, basketball related income, where the players association have bargained that players share 50 percent of all basketball related income.
There was an increase in salary cap overall because of the massive 24 billion dollar TV deal the NBA signed.
Other factors that might be included are the rise of social media in those years as well as more globalization of the game outside North America.
Correlation Player and Team Salary
## [1] 0.676354
Correlation Team Salary and Salary Cap
## [1] 0.7865825
Correlation Player Salary and Salary Cap
## [1] 0.8939345
Therefore, as a team should we really be looking at the 3 point shot and what teams are paying 3 point shooters who make more 3s? If we do a comparison now of team wins to 3 point shot we can see a trend forming. We can see a 0.89 correlation between player salary and salary cap.
## # A tibble: 11 x 5
## PlayerYear meanW `median(TeamWins,… `max(TeamWins, n… `min(TeamWins, n…
## <int> <dbl> <dbl> <int> <int>
## 1 2002 41 41 41 41
## 2 2004 55 55 55 55
## 3 2006 35 35 35 35
## 4 2007 31 31 31 31
## 5 2008 32 32 32 32
## 6 2013 47 47 47 47
## 7 2014 42.5 42.5 51 34
## 8 2015 67 67 67 67
## 9 2016 63.3 73 73 44
## 10 2017 54.9 55 67 36
## 11 2018 57.4 58 65 48
## # A tibble: 19 x 5
## PlayerYear meanW `median(TeamWins,… `max(TeamWins, n… `min(TeamWins, n…
## <int> <dbl> <dbl> <int> <int>
## 1 2000 40.1 42 67 15
## 2 2001 40.6 45 58 15
## 3 2002 40.9 42 61 21
## 4 2003 40.8 43 60 17
## 5 2004 41.0 41 61 21
## 6 2005 40.4 43 62 13
## 7 2006 41.1 41 64 21
## 8 2007 40.6 40 67 22
## 9 2008 40.6 41 66 15
## 10 2009 40.8 41 66 17
## 11 2010 40.1 42 61 12
## 12 2011 40.9 42 62 17
## 13 2012 32.8 35 50 7
## 14 2013 40.8 41 66 20
## 15 2014 40.9 43 62 15
## 16 2015 40.6 40 67 16
## 17 2016 40.6 42 73 10
## 18 2017 40.6 41 67 20
## 19 2018 40.2 44 65 21
As we can see entering past the 2015 NBA season there is a trend. Teams that had players who made more than three 3’s on their team won on average 50 or more games. While teams who had players who made less than three 3’s won an average of 40 games. The dip to 32 games is during the NBA shortened lock out season after the 2011 CBA where only 66 games were played.
So as a team knowing this trend we should look for these types of players to add onto our team.
Teams are going small ball there are no more 7ft players in the league anymore.
What we want to know is if the NBA is getting smaller. Are players like Lebron James(6ft9”), Draymond Green(6ft7”), PJ Tucker(6ft7”) going to replace todays big men? Should we not even bother to look at players over 7ft anymore?
## PlayerYear height salary
## 1 2000 200.9356 2880067
## 2 2001 200.9545 3398885
## 3 2002 201.5609 3487920
## 4 2003 201.6292 3783693
## 5 2004 201.3472 3817195
## 6 2005 201.4734 3911320
## 7 2006 201.0049 4091303
## 8 2007 200.6675 4140794
## 9 2008 200.8610 4574357
## 10 2009 201.2897 4715669
## 11 2010 200.9824 4829387
## 12 2011 201.3957 4560710
## 13 2012 200.9750 4348460
## 14 2013 200.8822 4447435
## 15 2014 200.9120 4402763
## 16 2015 200.8978 4430783
## 17 2016 201.1982 5097982
## 18 2017 201.1186 6218880
## 19 2018 200.5735 6450838
Height stayed relatively the same. However, maybe the shorter players became much taller which pulled up this average. Let us look at the number of 7 footers in the NBA throughout the 19 seasons.
## # A tibble: 19 x 2
## PlayerYear count_name
## <int> <int>
## 1 2000 37
## 2 2001 37
## 3 2002 40
## 4 2003 42
## 5 2004 37
## 6 2005 42
## 7 2006 38
## 8 2007 33
## 9 2008 37
## 10 2009 40
## 11 2010 37
## 12 2011 40
## 13 2012 37
## 14 2013 31
## 15 2014 38
## 16 2015 35
## 17 2016 42
## 18 2017 47
## 19 2018 44
Number of 7fters has stayed relatively the same over the period and has even seemed to increase. There were 47 plaeyrs that were 7 ft and taller in 2017. Therefore, overall the league still has players listed at 7ft. This is perhaps because media coverage do not cover these teams as much and thus creating the misconception of a smaller NBA.
Does PlusMinus really matter
Before advanced NBA analytics teams looked mainly at the points scored by a player as a reference point to start with salary. The rule of thumb was the more points scored the more exciting the player was to watch, the more fans thus the more revenue and hopefully this would equate to more wins as well. However, as the NBA started adopting player analytics some of these high scorers were found to be taking inefficient shots. Many players fell out of favor in the league such as Josh Smith and Michael Carter-Williams.
One advanced statistic that we should look more closely to is plus minus. The determination of how many points a player makes the other 4 players on the floor better.
PlusMinus and Salary Correlation
## [1] 0.2460146
We can see kind of a cluster in the middle between - 15 and 15 PlusMinus. Players getting the most salary have a plus minus between -5 and 5.
If we look at the team wins compared to plus minus we can see there is one player that has a plus minus of -40 but still won 60 plus games.
Alex Acker Outlier
## name team salary PlayerYear
## 1 Alex Acker DET 398762 2006
Now for the outlier where the player won less than 40 games but has a huge plus minus of over 20.
## name team salary PlayerYear
## 1 Hamed Haddadi MEM 1572221 2009
Who is Hamed Haddadi? And should we build a championship team around this player with such a good plus minus and get him away from his bad team?
The problem with plus minus.
Plus minus can be contributed to the competition the player is playing against. For example, a role player who plays with the second unit would be competing against other second unit teams and might have a plus minus better than the player who is starting ahead of them. Another problem with plus minus is the rotation teams use. The player might be stuck with an offensively challenge or defensively challenged player in the teams rotation who could be dragging their plus minus down.
## name salary PlusMinus PlayerYear
## 1 Othyus Jeffers 104034 33.5 2014
## 2 Pierre Jackson 191810 32.4 2017
## 3 Mike James 277566 29.7 2012
## 4 Chris McCullough 1191480 28.2 2017
## 5 Fred Vinson 485000 27.9 2000
## 6 Keith Appling 61775 25.8 2016
## 7 Ricky Davis 939360 22.7 2001
## 8 Hamed Haddadi 1572221 21.2 2009
## 9 Troy Daniels 158587 20.6 2014
## 10 Shavlik Randolph 306036 20.4 2014
## 11 Chris Johnson 508586 20.0 2013
## 12 Trajan Langdon 1394280 19.9 2000
## 13 JaVale McGee 1403611 19.5 2017
## 14 Draymond Green 14260870 18.0 2016
## 15 Ben Wallace 14500000 17.7 2009
## 16 Stephen Curry 11370786 17.7 2016
## 17 Stephen Curry 12112359 17.7 2017
## 18 Lamar Patterson 242963 17.7 2017
## 19 Quincy Miller 398393 17.6 2015
## 20 Brian Cardinal 465850 17.3 2002
Many of these top 20 PlusMinus players an average fan probably has never even heard about. This is the problem with PlusMinus, a role player or an unknown can play meaningless minutes and still have a great plus minus.
## name salary PlusMinus PlayerYear
## 1 Troy Daniels 158587 20.6 2014
## 2 Draymond Green 14260870 18.0 2016
## 3 Ben Wallace 14500000 17.7 2009
## 4 Stephen Curry 11370786 17.7 2016
## 5 Stephen Curry 12112359 17.7 2017
## 6 Andrew Bogut 12972973 17.0 2015
## 7 Tim Duncan 14260641 16.5 2005
## 8 Kevin Garnett 23750000 16.4 2008
## 9 Stephen Curry 10629213 16.4 2015
## 10 Draymond Green 915243 16.1 2015
## 11 Manu Ginobili 6603500 15.9 2005
## 12 Kyle Lowry 1011720 15.9 2007
## 13 Kevin Durant 26540100 15.7 2017
## 14 Zaza Pachulia 2898000 15.7 2017
## 15 Lamar Odom 14148596 15.4 2009
## 16 Klay Thompson 3075880 15.3 2015
## 17 Draymond Green 15330435 15.2 2017
## 18 Delonte West 3850000 15.0 2009
## 19 Chris Paul 22868827 14.9 2017
## 20 LeBron James 14410581 14.8 2009
The use of plus minus becomes better when we add in a minimum of 15 minutes requirement played. However, is Troy Daniels ultimately better than Lebron James just because his plus minus is higher and his salary is just a fraction of Lebron James?
Therefore, while plus minus has its uses in determine whether we want a player on our team we should use it only as an initial starting point in salary talks with free agents. In a sport where it’s about using the correct match ups to exploit the other team, it is important to wary of this statistic as well as other advanced analytic statistics that are being developed when trying to make decisions on player data.
Top College Programs Being Paid by the NBA
By looking at the top overall salary paid schools we can look at which colleges we should be considering when drafting for our team.
## # A tibble: 11 x 3
## school count_school salary
## <fct> <int> <dbl>
## 1 None 1285 7407695183
## 2 Kentucky 261 1220596079
## 3 Duke 252 1308479631
## 4 North Carolina 241 1182185814
## 5 UCLA 206 947159451
## 6 Kansas 194 840093830
## 7 Arizona 190 995023686
## 8 Connecticut 184 1038212650
## 9 Florida 143 817700613
## 10 Georgia Tech 129 718884989
## 11 Texas 116 627431018
When looking at the top schools being paid overall for the last 19 seasons, we see familiar names that are usually the top basketball programs in the United States. (Kentucky, Duke, North Carolina, UCLA, Kansas)
The none category includes players that did not attend college in the United States and players who were drafted outside the United States.
Look at these top 3 schools let us see which position was drafted the most.
## # A tibble: 5 x 2
## Pos count_pos
## <fct> <int>
## 1 PF 177
## 2 SF 158
## 3 SG 154
## 4 C 138
## 5 PG 115
Interestingly, the power forward position was drafted the most. There is a big discrepancy from power forward and point guard in these top schools. Which makes sense, most of the time these college teams will scout taller players in high school. There is a saying in basketball: “Height can’t be taught.” It is the belief that if a NBA team or a college school can draft a player who is tall but can somewhat play the game they could be transformed into a superstar. All teams are scouting for this type of player. The New Orleans Pelicans, Minnesota Timberwolves and Milwaukee Bucks were all teams that were extremely lucky by drafting Anthony Davis, Karl Anthony-Towns and Giannis Antetokounmpo.
Lets look at which teams have drafted the most from these schools over the 19 years.
## # A tibble: 10 x 2
## team count_team
## <fct> <int>
## 1 CHO 46
## 2 LAC 40
## 3 DET 36
## 4 BOS 32
## 5 DAL 32
## 6 WAS 32
## 7 MIL 31
## 8 PHO 31
## 9 CHI 30
## 10 NOP 28
While, Charlotte, Clippers and Detroit have been doing relatively well recently in the NBA in terms of win/loss. In the past these teams have been horrible. Charlotte in 2012 during the shortened NBA lock out season won only 7 games out of 66 possible games. While, the Clippers under Donald Sterling’s management was stuck in a never ending cycle of tanking games to get lottery picks before the addition of Chris Paul and Blake Griffin.
Top International players
There has been many international gems over the years. Most notable names are Dirk Nowitziki, Steve Nash, Peja Stojakovic and most recently in the 2019 season the rookie of the year candidate Luka Doncic. The game is becoming more global with countries now investing more resources to develop basketball leagues and programs for their younger generation. With this trend we want to look at the data and see where NBA teams should be sending their scouts abroad to be able to find the next potential international NBA star.
## # A tibble: 10 x 3
## country count_country salary
## <fct> <int> <dbl>
## 1 France 102 548647323
## 2 Spain 68 515601566
## 3 Brazil 62 329170240
## 4 Germany 29 278303863
## 5 Argentina 55 262352074
## 6 Turkey 45 261965589
## 7 Slovenia 52 199606417
## 8 Lithuania 27 199502355
## 9 Italy 32 197172659
## 10 Russia 32 180585067
We can see that France and Spain have received the most salary over the past 19 seasons. If we look at the Euro League championships, Spain and France are usually the top contenders every season. With Spain now developing a great Basketball League called the Liga ACB. With Luka Doncic playing in Real Madrid before joining the NBA.
The benefit of these European leagues is they allow players at the tender age of 15 to play professional basketball. This arguably can develop a player better for a professional career in basketball before he enters the NBA for the following reason:
The player is already playing with grown men
Develops the concept of team basketball
Is not enticed by shoe signings and American business model commercialization and brand building
More adjusted to a professional athletes life with the travel and the fans
While it is not in this scope as the NBA league becomes more globalized, it would be interesting in the future to do analysis and follow young players who transitioned from their teens in the European league to the NBA. (Ricky Rubio and Luka Doncic.)
The number 1 overall pick vs a second round Gem.
How important has been smart drafting in the second-round vs tanking to get the number one pick.
Notable names that show up in this category are Lebron James, Shaquille O’Neal, Tim Duncan and Allen Iverson. Looking at the salary spread vs the 2nd round gems, the 1st rounders got paid more with the highest salary accumulated being 200 million vs the 2nd round drafts of just getting paid 60 million salary accumulated. Notable names are Ben Wallace, Draymond Green and Marc Gasol. What is interesting from this chart is most of these players from the second round actually were known for their defense. Second rounders yielded 4 more defensive players than the first overall pick.
Over the 19 seasons, winning teams have spent over the cap on average while losing teams have stayed relatively close to the cap.
The best teams vs the worst teams
It is important to note that even if the team is not good and is trying to tank for the number 1 draft pick overall it must still spend closely to the yearly cap. Failing to do so the remaining cap will be spread amongst the current players on the active roster on a pro rata basis.
General managers thus elect to sign players to huge one year contracts just to meet the salary cap. By doing this it allows them future flexibility to sign in potential superstars when they enter the free agent market in the future.
Good teams also on average have a higher paid star player than bad teams.
Best vs the Worst team in the last 19 seasons
The 2016 Golden Stat Warriors had the greatest team success for a regular season in NBA history. They won 73 games but unfortunately was not able to claim the title. At the other end of the spectrum, the Charlotte Bobcats(Hornets) only won 7 games during the 2012 season.
## # A tibble: 14 x 6
## # Groups: salary [14]
## name salary ppg EFG t_reb PlusMinus
## <fct> <int> <dbl> <dbl> <dbl> <dbl>
## 1 Klay Thompson 15501000 22.1 0.569 3.8 14.7
## 2 Draymond Green 14260870 14 0.554 9.5 18
## 3 Andrew Bogut 12000000 5.4 0.625 7 13.9
## 4 Andre Iguodala 11710456 7 0.544 4 13.2
## 5 Stephen Curry 11370786 30.1 0.631 5.4 17.7
## 6 Anderson Varejao 10158574 2.6 0.435 2.7 2.6
## 7 Shaun Livingston 5543725 6.3 0.531 2.2 7
## 8 Harrison Barnes 3873398 11.7 0.531 4.9 10.5
## 9 Marreese Speights 3815500 7.1 0.452 3.3 1.2
## 10 Leandro Barbosa 2500000 6.4 0.519 1.7 2.4
## 11 Festus Ezeli 2008748 7 0.54 5.6 14.6
## 12 Brandon Rush 1270964 4.2 0.542 2.5 -0.9
## 13 Ian Clark 947276 3.6 0.5 1 -7.3
## 14 James Michael McAdoo 845059 2.9 0.55 1.4 -15.9
## total TopPaid NameTopSal highscore NameTopPpg efficient
## 1 95806356 15501000 Klay Thompson 30.1 Stephen Curry 0.6311881
## NameTopEFG HighPlusMinus NameTopPM LeastPaid
## 1 Stephen Curry 18 Draymond Green 845059
## NameLowSal AvgSal tmsalary salcap OverUnder wins
## 1 James Michael McAdoo 6843311 93669566 70000000 1.338137 73
If we closely look at the Golden State Warriors salary spread for that year. Their MVP was only their 5th highest paid player. Stephen Curry’s previous contract was at a discount due to his numerous ankle injuries before. Both Klay Thompson and Draymond Green still did not have as many years in the league and were unable to unlock a higher salary per the rules of the CBA. The warriors also did a good job negotiating their salaries. Harrison Barnes was also still on his rookie salary.
The warriors also did a good job in their team drafting.
## name salary ppg EFG
## 104 Antawn Jamison 2503800 19.6 0.4715909
## 1564 Jason Richardson 2607360 15.6 0.4612676
## 1746 JR Bremer 563679 3.3 0.3295455
## 4732 Stephen Curry 2913840 18.6 0.5492958
## 5358 Klay Thompson 2222160 16.6 0.5102041
## 5813 Harrison Barnes 2923920 9.5 0.4482759
For that Charlotte team, Michael Jordan at the time was a relatively new owner, who tried to run too many jobs of the team by himself. This led to bad drafting choices as well as poor leadership choices within the organization.
## # A tibble: 14 x 6
## # Groups: salary [14]
## name salary ppg EFG t_reb PlusMinus
## <fct> <int> <dbl> <dbl> <dbl> <dbl>
## 1 Corey Maggette 10262069 15 0.402 3.9 -15.4
## 2 Tyrus Thomas 7305765 5.6 0.361 3.7 -14.4
## 3 DeSagana Diop 6925400 1.1 0.375 3.1 -19.1
## 4 Matt Carroll 3900000 2.7 0.367 1.1 -11.9
## 5 DJ Augustin 3236470 11.1 0.441 2.3 -14.7
## 6 Bismack Biyombo 2798040 5.2 0.455 5.8 -16
## 7 Eduardo Najera 2750000 2.6 0.448 2.3 -7.2
## 8 Reggie Williams 2500000 8.3 0.481 2.8 -18.4
## 9 Kemba Walker 2356320 12.1 0.414 3.5 -14.3
## 10 Gerald Henderson 2250600 15.1 0.466 4.1 -13.7
## 11 DJ White 2001167 6.8 0.492 3.6 -15.3
## 12 Byron Mullens 1288200 9.3 0.440 5 -12.4
## 13 Derrick Brown 854389 8.1 0.532 3.6 -13
## 14 Cory Higgins 473604 3.9 0.345 0.9 -12.9
## total TopPaid NameTopSal highscore NameTopPpg efficient
## 1 48902024 10262069 Corey Maggette 15.1 Gerald Henderson 0.531746
## NameTopEFG HighPlusMinus NameTopPM LeastPaid NameLowSal
## 1 Derrick Brown -7.2 Eduardo Najera 473604 Cory Higgins
## AvgSal tmsalary salcap OverUnder wins
## 1 3493002 57902024 58044000 0.997554 7
The Bobcats for that year lost 23 games in a row after their last win in March. Their best player at the time was still a relatively new player to the league Kemba Walker. Looking at the salary spread this team was clearly made to just tank. If you look at their highest paid players, this team clearly was just trying to meet the salary cap and was rebuilding. Many teams have followed this trend and if a team wants to rebuild and reset their team. This type of formula might work, however the length of time and how much the local fan base will have to suffer during those years should be put under careful consideration as well.
What would the most average player look like.
## height weight salary plusMinus age games start
## 1 200.9356 222.0764 2880067 -1.1627685 27.74702 58.84248 28.57518
## 2 200.9545 222.6364 3398885 -1.2510101 27.72727 59.10859 29.96465
## 3 201.5609 224.6878 3487920 -0.9461929 27.18782 58.99239 29.94162
## 4 201.6292 224.7008 3783693 -0.9245524 27.12788 60.20716 30.38875
## 5 201.3472 224.0000 3817195 -0.9489637 27.13731 59.93782 30.67098
## 6 201.4734 224.2488 3911320 -1.5695652 26.94928 58.76570 29.57729
## 7 201.0049 223.2469 4091303 -1.2205379 26.54279 59.48166 30.03912
## 8 200.6675 222.7530 4140794 -1.2315914 26.44893 58.72922 29.19002
## 9 200.8610 222.3325 4574357 -1.1846154 26.84367 59.01985 29.46402
## 10 201.2897 223.0907 4715669 -1.4602015 26.57683 59.11839 30.14106
## 11 200.9824 222.8643 4829387 -1.1979899 26.63065 59.36181 29.98744
## 12 201.3957 224.0192 4560710 -1.2580336 26.63309 59.46043 29.47242
## 13 200.9750 222.8636 4348460 -1.7250000 26.58182 46.56591 22.44545
## 14 200.8822 222.1455 4447435 -1.1697460 26.74134 58.67206 28.35566
## 15 200.9120 221.4718 4402763 -1.0440181 26.56208 56.99097 27.74266
## 16 200.8978 221.3244 4430783 -1.5308889 26.68667 56.22222 26.72444
## 17 201.1982 221.5248 5097982 -1.4074324 26.71171 57.95045 27.69369
## 18 201.1186 220.4295 6218880 -1.3767338 26.44743 57.69128 27.50336
## 19 200.5735 218.2668 6450838 -1.6067227 26.18697 54.02731 25.83193
## mins fg fg_att fg_per three three_att three_per
## 1 21.36683 3.163962 7.130549 0.4365386 0.4152745 1.196897 0.3469591
## 2 22.04369 3.172980 7.226768 0.4314876 0.4270202 1.214646 0.3515593
## 3 22.13680 3.268528 7.395939 0.4361455 0.4573604 1.315990 0.3475410
## 4 21.92558 3.178517 7.233504 0.4312024 0.4498721 1.297954 0.3466010
## 5 22.43523 3.219689 7.393523 0.4314557 0.4694301 1.368135 0.3431168
## 6 22.23430 3.242995 7.316908 0.4368336 0.4985507 1.423430 0.3502461
## 7 22.09927 3.220049 7.142787 0.4449107 0.5051345 1.422005 0.3552270
## 8 21.78147 3.243468 7.123278 0.4489531 0.5399050 1.519952 0.3552118
## 9 21.88462 3.283623 7.260794 0.4452385 0.5727047 1.604963 0.3568336
## 10 22.30428 3.368766 7.394207 0.4500620 0.5881612 1.620907 0.3628594
## 11 22.20503 3.392211 7.397990 0.4549261 0.5753769 1.642714 0.3502600
## 12 21.86355 3.294005 7.226859 0.4536585 0.5693046 1.598801 0.3560822
## 13 21.37068 3.148636 7.088636 0.4374295 0.5572727 1.616364 0.3447694
## 14 21.42471 3.220785 7.195150 0.4430161 0.6217090 1.771824 0.3508863
## 15 21.31603 3.255305 7.228217 0.4441381 0.6706546 1.890745 0.3547039
## 16 21.10867 3.213556 7.233111 0.4399928 0.6842222 1.974444 0.3465391
## 17 21.10946 3.279054 7.300450 0.4472961 0.7204955 2.056081 0.3504217
## 18 20.98277 3.321477 7.314989 0.4507245 0.8192394 2.313423 0.3541244
## 19 20.86660 3.341807 7.363445 0.4518979 0.8915966 2.509244 0.3553248
## two two_att two_per efg ft ft_att ft_per
## 1 2.751551 5.931981 0.4533731 0.4636205 1.614797 2.168258 0.7447441
## 2 2.744192 6.006566 0.4447630 0.4589386 1.648737 2.220202 0.7426069
## 3 2.808376 6.078426 0.4528598 0.4633263 1.612183 2.154569 0.7482625
## 4 2.725064 5.936829 0.4467322 0.4589927 1.635806 2.171355 0.7533569
## 5 2.749223 6.030570 0.4465291 0.4599201 1.655181 2.217098 0.7465529
## 6 2.745652 5.896135 0.4558527 0.4670815 1.766184 2.355556 0.7497949
## 7 2.713692 5.719560 0.4639777 0.4762883 1.755990 2.367726 0.7416357
## 8 2.705226 5.605463 0.4710350 0.4836586 1.746793 2.325653 0.7510979
## 9 2.708189 5.654839 0.4662997 0.4822767 1.661042 2.228040 0.7455173
## 10 2.779597 5.770277 0.4695675 0.4879611 1.702519 2.234509 0.7619209
## 11 2.816834 5.754020 0.4830450 0.4916816 1.676131 2.223367 0.7538705
## 12 2.724460 5.628537 0.4743616 0.4914891 1.623501 2.148681 0.7555804
## 13 2.590455 5.472045 0.4634083 0.4749558 1.461364 1.953636 0.7480223
## 14 2.603233 5.422171 0.4720255 0.4849253 1.447575 1.937644 0.7470799
## 15 2.584424 5.334989 0.4844292 0.4899395 1.517156 2.014221 0.7532220
## 16 2.530889 5.258444 0.4724925 0.4866008 1.461333 1.960222 0.7454937
## 17 2.559910 5.245045 0.4820120 0.4949224 1.518919 2.022973 0.7508350
## 18 2.503803 5.000447 0.4925256 0.5044656 1.513199 1.970470 0.7679382
## 19 2.448950 4.857143 0.4964841 0.5103584 1.409874 1.842437 0.7652223
## o_reb d_reb t_reb assist steal block TO
## 1 1.1193317 2.684964 3.805489 1.926969 0.6947494 0.4441527 1.308115
## 2 1.1022727 2.782576 3.886616 1.927273 0.7131313 0.4750000 1.305303
## 3 1.1390863 2.761929 3.899239 1.955838 0.7104061 0.4794416 1.269543
## 4 1.1153453 2.731202 3.843223 1.909719 0.7150895 0.4659847 1.291816
## 5 1.1341969 2.789637 3.920466 1.946891 0.7300518 0.4608808 1.335233
## 6 1.1128019 2.739372 3.853382 1.914010 0.6917874 0.4492754 1.279710
## 7 1.0273839 2.715648 3.739120 1.831051 0.6562347 0.4283619 1.264792
## 8 1.0049881 2.682423 3.686698 1.876960 0.6539192 0.4121140 1.311639
## 9 1.0317618 2.776427 3.806700 1.949132 0.6625310 0.4344913 1.232506
## 10 1.0423174 2.806297 3.847355 1.875819 0.6690176 0.4632242 1.237028
## 11 1.0288945 2.821608 3.848995 1.907035 0.6600503 0.4555276 1.243970
## 12 1.0074341 2.775300 3.779376 1.876499 0.6565947 0.4510791 1.229976
## 13 1.0027273 2.702045 3.701364 1.835682 0.6779545 0.4477273 1.238182
## 14 0.9921478 2.727945 3.721247 1.934180 0.6866051 0.4533487 1.228176
## 15 0.9629797 2.783070 3.747178 1.915124 0.6717833 0.4164786 1.249661
## 16 0.9446667 2.802889 3.741111 1.905333 0.6786667 0.4097778 1.198889
## 17 0.9121622 2.896622 3.803153 1.929505 0.6826577 0.4344595 1.209459
## 18 0.8870246 2.876957 3.762640 1.930425 0.6657718 0.4105145 1.157271
## 19 0.8352941 2.870798 3.704202 1.993277 0.6739496 0.4115546 1.181513
## fouls ppg PlayerYear Type
## 1 2.126730 8.356086 2000 Avg
## 2 2.092424 8.421970 2001 Avg
## 3 1.990609 8.605330 2002 Avg
## 4 2.036829 8.443223 2003 Avg
## 5 2.034197 8.566062 2004 Avg
## 6 2.151691 8.751691 2005 Avg
## 7 2.141565 8.699022 2006 Avg
## 8 2.047743 8.775534 2007 Avg
## 9 1.960794 8.801737 2008 Avg
## 10 1.987406 9.029723 2009 Avg
## 11 1.983668 9.028141 2010 Avg
## 12 1.943405 8.773621 2011 Avg
## 13 1.779545 8.311364 2012 Avg
## 14 1.800231 8.513395 2013 Avg
## 15 1.869074 8.694808 2014 Avg
## 16 1.807778 8.568444 2015 Avg
## 17 1.811486 8.797523 2016 Avg
## 18 1.768680 8.970694 2017 Avg
## 19 1.753782 8.973319 2018 Avg
## name height weight salary plusMinus age games start
## 1 AvgJoe 201.0873 222.5623 4399392 -1.274556 26.81424 57.84977 28.6163
## mins fg fg_att fg_per three three_att three_per
## 1 21.7084 3.25418 7.261427 0.4429425 0.5806992 1.650448 0.3515404
## two two_att two_per efg ft ft_att ft_per
## 1 2.673354 5.61071 0.4679881 0.4806002 1.601489 2.132454 0.7511976
## o_reb d_reb t_reb assist steal block TO fouls
## 1 1.021201 2.775143 3.794608 1.912669 0.681629 0.4422839 1.251199 1.951981
## ppg
## 1 8.68851
Looking through the years we can confirm again the trend that more teams are attempting more 3 pointers compared to two pointers. Also, overall salary has been increasing at a much faster pace. This is due to the contractual talks betweent he owners and players. With more money coming in this increases baskeball related revenue and this increases the salary cap.
Now lets make a comparison of the average joe to different types of players and award winners in the NBA. From this we perhaps can develop a threshold or a reference point for salary talks.
Comparing Avg Joe salary and production to:
A superstar
Tim Duncan vs Avg Joe
Duncan salary and production is above the average player. Late in his career he took pay cuts which brought his salary down.
Lebron James vs Avg Joe
Lebron salary has kept going up. His body has not slowed down either. The amount of minutes played has not really decreased at all. Seems like a player anomally.
Dirk vs Avg Joe
Dirk took pay cuts as well. His production also started to decrease. It would be interesting to compare height and player life cycle.
A star
Vince vs Avg Joe
Vince Carters Life cycle goes up in years where it reaches its peak in his 30s then begins to decrease. Pay goes up because of new contract deals for NBA players increasing the overall salary cap.
Melo vs Avg Joe
There is no decreasein salary for Melo. However, this player is out of the league now due to the ineffciency.
Gasol vs Avg Joe
Gasol also had a decrease in salary but the new TV money seems to have set in a new reference point on how much veterans should be paid. Role Player Mike Miller vs Avg Joe
Mike Miller took a pay cut probably to play for a contender.
Crawford vs Avg Joe
Jamal Crawford got a huge payday after the new TV money after an initial decrease.
Nene Hilario vs Avg Joe
Nene salary decreased. Probably got the veterans minimum with the new 2017 CBA.
MVP Salary vs Avg Joe
## PlayerYear salary height_cm PlusMinus MinGames ppg EFG
## 37 2000 14000000 206 8.9 35.9 25.5 0.5111111
## 776 2001 19285715 216 8.7 39.5 28.7 0.5729167
## 972 2002 11250000 183 5.7 43.7 31.4 0.4226619
## 1243 2003 12072500 214 9.6 39.3 23.3 0.5145349
## 1639 2004 12676125 214 11.4 36.6 22.3 0.5029240
## 2267 2005 16000000 211 2.5 38.1 22.2 0.5030120
## 2772 2006 9625000 191 8.1 35.4 18.8 0.5783582
## 2952 2007 10500000 191 11.1 35.3 18.6 0.6132812
## 3488 2008 16360094 214 9.0 36.0 23.6 0.5087719
## 3741 2009 21262500 198 11.2 36.1 26.8 0.5023923
## 4106 2010 15779912 203 11.1 39.0 29.7 0.5447761
## 4569 2011 14500000 203 10.1 38.8 26.7 0.5425532
## 4859 2012 6993708 191 10.4 35.3 21.8 0.4719101
## 5527 2013 17545000 203 12.4 37.9 26.8 0.6067416
## 6107 2014 19067500 203 7.2 37.7 27.1 0.6107955
## 6413 2015 19997513 206 9.0 33.8 25.4 0.5780347
## 6635 2016 11370786 191 17.7 34.2 30.1 0.6311881
## 7091 2017 12112359 191 17.7 33.4 25.3 0.5765027
## 7785 2018 28299399 191 6.4 36.4 25.4 0.4786730
MVP salary of course is above the average joe salary. Winning the MVP award allows players to get the maximum contract. Other ways to obtain the max contract including making the All-NBA team consistently. This is in the CBA called the designated player exception.
The designated player exception: Applies to players that have 7 to 9 years of NBA experience. It allows players to get paid the max which is 35 percent of the current salary cap. Players before needed to be 10 years in the league before making 35 percent. (7 to 9 made 30, 0 to 6 made 25 percent of the salary cap.)
The requirements also are that the player must make two All-NBA teams or win defensive player of the year or MVP 3 years prior to signing the deal.
The Derrick Rose rule allows rookies to sign for 30 percent of the team salary cap if they were voted to the all star game twice and got on two All-NBA teams. Winning the MVP would also qualify the player to get 30 percent of the salary cap.
DPOY Salary vs Avg Joe
## PlayerYear salary height_cm PlusMinus MinGames ppg EFG Type
## 370 2000 15004800 208 5.4 34.8 21.7 0.5533333 DPOY
## 542 2001 16880000 208 5.3 23.5 13.6 0.5185185 DPOY
## 884 2002 14315790 219 5.1 36.3 11.5 0.5000000 DPOY
## 1584 2003 5200000 206 4.2 39.4 6.9 0.4833333 DPOY
## 1640 2004 5500000 206 7.2 37.7 9.5 0.4239130 DPOY
## 2279 2005 6157895 201 6.7 41.6 24.6 0.5235294 DPOY
## 2504 2006 7500000 206 11.6 35.2 7.3 0.5087719 DPOY
## 3019 2007 16000000 206 3.0 35.0 6.4 0.4545455 DPOY
## 3440 2008 11250000 211 3.7 34.9 9.1 0.4562500 DPOY
## 3646 2009 24751934 211 13.6 31.1 15.8 0.5307692 DPOY
## 4197 2010 15202590 211 11.8 34.7 18.3 0.6078431 DPOY
## 4679 2011 16647180 211 8.8 37.6 22.9 0.5895522 DPOY
## 4978 2012 18091770 211 4.5 38.3 20.6 0.5746269 DPOY
## 5385 2013 13604188 216 6.1 32.8 10.4 0.6393443 DPOY
## 5849 2014 14860523 216 0.2 33.4 14.6 0.4710744 DPOY
## 6179 2015 12700000 211 2.1 30.6 7.2 0.4375000 DPOY
## 6983 2016 16407500 201 13.8 33.1 21.2 0.5695364 DPOY
## 7278 2017 17638063 201 8.8 33.4 25.5 0.5423729 DPOY
## 7764 2018 16400000 201 6.4 32.7 11.0 0.5170455 DPOY
We can see that winning Defensive player of the year the salary is higher than the average joe. This is due to the requirements of the CBA as well as the relative market demand for these players.
ROY Salary vs Avg Joe
## PlayerYear salary height_cm PlusMinus MinGames ppg EFG
## 412 2000 2267280 201 3.2 38.1 25.7 0.4927536
## 786 2001 3629160 203 -8.0 39.3 20.1 0.4756098
## 827 2002 2494080 203 4.0 33.7 15.2 0.5118110
## 1363 2003 3193680 214 -2.2 36.0 19.0 0.5073529
## 1652 2004 1899720 208 -2.3 36.8 20.6 0.4777070
## 2172 2005 4320360 203 2.2 42.4 27.2 0.5023697
## 2413 2006 4020120 208 -4.2 33.6 13.2 0.4098361
## 3085 2007 3380160 183 -1.0 36.8 17.3 0.4705882
## 3504 2008 2883120 198 0.8 37.7 19.1 0.4873418
## 3686 2009 4484040 206 -8.6 39.0 25.3 0.5079787
## 4046 2010 5184480 191 -0.4 36.8 20.8 0.4943182
## 4742 2011 3880920 198 -4.5 37.0 17.8 0.4329268
## 5123 2012 5731080 208 7.5 36.2 20.7 0.5483871
## 5479 2013 5530080 191 -4.4 34.7 22.5 0.5027624
## 6101 2014 3202920 191 6.0 35.8 20.7 0.5062893
## 6583 2015 2300040 198 -3.6 32.6 14.6 0.4136691
## 6716 2016 5758680 203 -1.0 35.1 20.7 0.4781250
## 7178 2017 5960160 214 0.0 37.0 25.1 0.5777778
## 7861 2018 1312611 196 -2.4 29.9 13.0 0.5476190
If we look at the rookie salary scale it seems to be going up and down. There are many reasons behind this.
The first one being not always a player in the first round will win rookie of the year. Especially if that draft class is really weak. Malcolm Brogdon won rookie of the year but because he was picked in the second round 36th overall he was paid relatively low compared to a rookie players drafted in the first round.
You might be wondering why number 1 overall pick Ben Simmons had such a low salary. Ben Simmons was drafted in 2017, but because of an injury was unable to play the entire season. The new CBA was in affect for rookies for the start of the 2018 season. Which saw rookies on average get a pay increase. However, since Ben Simmons contract was signed from the original CBA his salary was relatively much lower.
Mid-level exception player vs Avg Joe
The mid-level exception allows NBA teams to sign players they normally would not be able to sign if they were over the salary cap.
Before the 2011 CBA it was set to the average NBA salary that was over the cap. After that it was set to a certain amount each year. For the 2017 CBA the apron was set to 6 million above the tax line for the 2017 and 2018 season. This number would go up and down depending on the current salary cap for the year.
What to Expect with a mid-level exception player.
## PlayerYear salary height_cm PlusMinus MinGames ppg EFG
## 1 2018 5226548 199.8919 -1.45945946 19.97838 7.862162 0.5142168
## 2 2017 3392485 199.9000 -1.43000000 18.22250 7.077500 0.5026139
## 3 2016 3282897 201.0238 -1.26666667 18.58095 6.902381 0.4972249
## 4 2015 3267768 199.6735 -0.24693878 21.51633 8.748980 0.4935115
## 5 2014 3064909 202.5106 -0.62978723 19.58936 7.225532 0.5069429
## 6 2013 3080277 200.3958 -0.99166667 19.80833 7.329167 0.4928712
## 7 2012 3033485 202.2500 -1.41923077 21.43654 7.290385 0.4944566
## 8 2011 4746809 200.3542 -1.43958333 22.56667 8.477083 0.4938972
## 9 2010 5132612 200.6364 -0.35454545 22.12182 7.790909 0.4983103
## 10 2009 4924562 201.4000 -0.03200000 22.65800 7.588000 0.4923068
## 11 2008 4890182 200.2000 -0.30000000 22.48364 7.709091 0.4937362
## 12 2007 4567807 200.0189 -1.11320755 22.99623 8.175472 0.4912060
## 13 2006 4488497 199.0476 -0.06031746 24.19841 8.488889 0.4902785
## 14 2005 4405707 203.0217 -1.96086957 23.15000 8.804348 0.4673055
## 15 2004 4485209 202.4651 0.67441860 24.23023 9.153488 0.4651711
## 16 2003 3824031 200.2083 1.52500000 22.42708 8.054167 0.4753164
## 17 2002 3696504 200.1746 -0.57777778 23.50159 8.868254 0.4737335
## 18 2001 3585560 199.4237 -1.44915254 23.21017 8.038983 0.4646160
## 19 2000 3023264 199.9796 -0.65306122 22.12245 8.469388 0.4679259
Because contract details on how teams used their mid-level exceptions through 2000-2018 wasn’t available online. I used a over and under 25 percent rule for the mid-level exception set for that current year.
After the 2012 season the NBA wanted to cut down on teams who are over the salary cap from signing good players. A good example of this is the Miami Heat who used their mid-level exception on Shane Battier during the 2012 season. Under the 2011 CBA going into the 2012 season, it only allowed teams over the luxury tax threshold to use up to 3 million dollars. Of course, the mid- level exception could be used on multiple players as well thus the decrease compared to the average salary. During the 2017 CBA this was revised and saw teams were able to spend more on the mid-level exception up to 6 million dollars.
Still it is below the average, meaning the use of the mid-level exception to possibly find a good player to fill in the team has gotten more difficult. As more and more money is being pour into the league and the increase of salary cap, more teams are able to test the market and command more than the mid- level exception.
6. What statistics are explaining salary
When looking at the correlation with player salary. The highest correlation is turnovers(0.51), PPG(0.59), started(0.48) and two(0.56). While the weakest was EFG(0.19)
We will be looking at age, total rebounds, turnover, points per game, field goals, started, threes, twos, assists, blocks, steals, effective field goal percentage and number of years played in the league.
##
## Call:
## lm(formula = salary ~ Age + t_reb + TO + ppg + FG + started +
## three + two + assist + block, data = data, subset = +steal +
## EFG + NumYears)
##
## Coefficients:
## (Intercept) Age t_reb TO ppg
## -5239935 197200 61568 705855 1655890
## FG started three two assist
## -2484691 -10019 -3508226 -1131077 7192
## block
## 1710857
##
## Call:
## lm(formula = salary ~ Age + t_reb + TO + ppg + FG + started +
## three + two + assist + block, data = data, subset = +steal +
## EFG + NumYears)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1891071 -748458 164797 459419 3079758
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -5239934.9 126672.6 -41.366 < 2e-16 ***
## Age 197200.4 5763.1 34.218 < 2e-16 ***
## t_reb 61568.3 12258.2 5.023 0.000000520791913 ***
## TO 705854.6 95449.1 7.395 0.000000000000156 ***
## ppg 1655890.3 25637.4 64.589 < 2e-16 ***
## FG -2484691.0 291813.1 -8.515 < 2e-16 ***
## started -10019.5 629.3 -15.921 < 2e-16 ***
## three -3508226.0 260947.8 -13.444 < 2e-16 ***
## two -1131077.1 266445.9 -4.245 0.000022102132472 ***
## assist 7191.8 24783.8 0.290 0.772
## block 1710857.1 38982.6 43.888 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 826800 on 7967 degrees of freedom
## Multiple R-squared: 0.9117, Adjusted R-squared: 0.9116
## F-statistic: 8228 on 10 and 7967 DF, p-value: < 0.00000000000000022
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -5239934.852 126672.6351 -41.3659576 0.000000e+00
## Age 197200.415 5763.0814 34.2178779 1.514667e-239
## t_reb 61568.269 12258.2061 5.0226166 5.207919e-07
## TO 705854.593 95449.1347 7.3950863 1.556817e-13
## ppg 1655890.288 25637.3569 64.5889627 0.000000e+00
## FG -2484691.033 291813.0784 -8.5146665 1.977214e-17
## started -10019.476 629.3412 -15.9205792 3.336273e-56
## three -3508225.958 260947.8496 -13.4441650 9.248692e-41
## two -1131077.115 266445.9236 -4.2450532 2.210213e-05
## assist 7191.838 24783.8135 0.2901829 7.716839e-01
## block 1710857.145 38982.5912 43.8877225 0.000000e+00
RMSE
## [1] 826180.8
The adjusted R squared was 0.9116 which is pretty good. However, we could be dealing with multicollinearity between twos, threes and field goals made. Also, with age and number of years played.
So next we should test the model with these factors removed as well as the lowest correlation EFG.
##
## Call:
## lm(formula = salary ~ t_reb + TO + ppg + FG + started + block,
## data = data, subset = +steal + NumYears)
##
## Coefficients:
## (Intercept) t_reb TO ppg FG
## -802311 562192 633838 636539 -1507803
## started block
## -9437 1222420
##
## Call:
## lm(formula = salary ~ t_reb + TO + ppg + FG + started + block,
## data = data, subset = +steal + NumYears)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2320857 -631378 -335339 718945 1550191
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -802311.0 28305.9 -28.34 <2e-16 ***
## t_reb 562191.9 9524.5 59.03 <2e-16 ***
## TO 633837.9 46678.3 13.58 <2e-16 ***
## ppg 636538.7 21541.9 29.55 <2e-16 ***
## FG -1507803.4 48172.3 -31.30 <2e-16 ***
## started -9436.8 662.5 -14.24 <2e-16 ***
## block 1222419.9 41471.4 29.48 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1031000 on 7971 degrees of freedom
## Multiple R-squared: 0.8715, Adjusted R-squared: 0.8714
## F-statistic: 9011 on 6 and 7971 DF, p-value: < 0.00000000000000022
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -802311.026 28305.8935 -28.34431 1.803136e-168
## t_reb 562191.871 9524.5346 59.02565 0.000000e+00
## TO 633837.916 46678.2791 13.57886 1.545159e-41
## ppg 636538.728 21541.9032 29.54886 3.449552e-182
## FG -1507803.444 48172.2508 -31.30025 6.070874e-203
## started -9436.754 662.4577 -14.24507 1.734605e-45
## block 1222419.853 41471.3903 29.47622 2.388163e-181
## [1] 8478009963550667
RMSE
## [1] 1030861
Adjusted R squared worsened therefore we should add back the variables and test without EFG.
##
## Call:
## lm(formula = salary ~ t_reb + TO + ppg + FG + Age + started +
## three + two + assist + block, data = data, subset = +steal +
## NumYears)
##
## Coefficients:
## (Intercept) t_reb TO ppg FG
## -4924622.4 69609.1 713760.0 1608869.0 -2058458.2
## Age started three two assist
## 185457.7 -11621.6 -3808624.1 -1421161.9 804.6
## block
## 1714923.3
##
## Call:
## lm(formula = salary ~ t_reb + TO + ppg + FG + Age + started +
## three + two + assist + block, data = data, subset = +steal +
## NumYears)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1890342 -719888 143796 410398 1786027
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -4924622.4 133245.1 -36.959 < 2e-16 ***
## t_reb 69609.1 12584.1 5.532 0.00000003275268 ***
## TO 713760.0 102179.4 6.985 0.00000000000307 ***
## ppg 1608869.0 22628.6 71.099 < 2e-16 ***
## FG -2058458.2 305883.4 -6.730 0.00000000001820 ***
## Age 185457.7 6119.3 30.307 < 2e-16 ***
## started -11621.6 594.4 -19.550 < 2e-16 ***
## three -3808624.1 270022.4 -14.105 < 2e-16 ***
## two -1421161.9 278863.9 -5.096 0.00000035439981 ***
## assist 804.6 26500.8 0.030 0.976
## block 1714923.3 36829.5 46.564 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 793700 on 7967 degrees of freedom
## Multiple R-squared: 0.9239, Adjusted R-squared: 0.9238
## F-statistic: 9678 on 10 and 7967 DF, p-value: < 0.00000000000000022
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -4924622.428 133245.1163 -36.95912138 3.988133e-276
## t_reb 69609.116 12584.0843 5.53152018 3.275268e-08
## TO 713759.977 102179.3823 6.98536203 3.069980e-12
## ppg 1608868.968 22628.5502 71.09907422 0.000000e+00
## FG -2058458.201 305883.3637 -6.72955265 1.819642e-11
## Age 185457.737 6119.2789 30.30712254 4.710360e-191
## started -11621.556 594.4453 -19.55025298 3.571411e-83
## three -3808624.079 270022.4458 -14.10484254 1.217956e-44
## two -1421161.879 278863.9301 -5.09625565 3.543998e-07
## assist 804.633 26500.8058 0.03036258 9.757786e-01
## block 1714923.339 36829.5154 46.56383119 0.000000e+00
## [1] 5018554660236205
RMSE
## [1] 793126.2
The adjusted R squared was better than the first model. It also had a high F stat and low p value.
So, can these particular variables determine a salary? While the model fit was good at 0.9152 with a high F stat of 9678 and a extremely low p-value showing significance. However, there has been 3 different collective bargaining agreements in the 2000-2018 NBA seasons. Each CBA brings new type of agreements between the players association and the NBA owners.
Let’s first look again at the trend of salary.
We can see there was a sudden increase during the 2015 season. We could be dealing with a Regime bias in our data. Therefore, perhaps it might be better to split up the data according to CBA.
Because there was such an increase in salary we might want to look at if it might be better to separate the data according to the CBAs. 1999 CBA, 2005 CBA, 2011 CBA and 2017 CBA.
What is the CBA(Collective Bargaining Agreement)
The collective bargaining agreement is a contractual agreement between the NBA, the NBA owners as well as the NBA players. The NBA players are represented by the president of the players union which is currently held by Chris Paul. A new CBA is signed every 6 years that includes discussions of basketball related income, player salary, pension, health care, etc… If there is no agreement between the NBA owners, NBA and players a NBA lockout will continue until the sides reach an agreement. Thus leading to a shorter season played.
Let us split the 2000 to 2018 season into 3 CBA’s. 1999-2005, 2006-2011 and 2012-2018.
1999 CBA
##
## Call:
## lm(formula = salary ~ Age + t_reb + TO + ppg + FG + started +
## three + two + assist + block, data = cba1, subset = +steal +
## NumYears)
##
## Coefficients:
## (Intercept) Age t_reb TO ppg
## -4722618 176655 63148 814801 1665340
## FG started three two assist
## -1974900 -11539 -4136394 -1645545 -18589
## block
## 1650149
##
## Call:
## lm(formula = salary ~ Age + t_reb + TO + ppg + FG + started +
## three + two + assist + block, data = cba1, subset = +steal +
## NumYears)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1852106 -615220 85809 333425 1768193
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -4722618 246846 -19.132 < 2e-16 ***
## Age 176655 11456 15.420 < 2e-16 ***
## t_reb 63148 23090 2.735 0.006286 **
## TO 814801 188083 4.332 0.0000154 ***
## ppg 1665340 44007 37.843 < 2e-16 ***
## FG -1974900 559603 -3.529 0.000425 ***
## started -11539 1023 -11.277 < 2e-16 ***
## three -4136394 490758 -8.429 < 2e-16 ***
## two -1645545 505201 -3.257 0.001141 **
## assist -18589 48975 -0.380 0.704310
## block 1650149 67529 24.436 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 801600 on 2389 degrees of freedom
## Multiple R-squared: 0.9239, Adjusted R-squared: 0.9235
## F-statistic: 2898 on 10 and 2389 DF, p-value: < 0.00000000000000022
Mean 1999 CBA salary
## [1] 3629652
RMSE 1999 CBA
## [1] 438653
2005 CBA
##
## Call:
## lm(formula = salary ~ Age + t_reb + TO + ppg + FG + started +
## three + two + assist + block, data = cba2, subset = +steal +
## NumYears)
##
## Coefficients:
## (Intercept) Age t_reb TO ppg
## -1449376 73775 176073 -3466136 -1705038
## FG started three two assist
## -94004 36551 6614905 5778475 540547
## block
## 784881
##
## Call:
## lm(formula = salary ~ Age + t_reb + TO + ppg + FG + started +
## three + two + assist + block, data = cba2, subset = +steal +
## NumYears)
##
## Residuals:
## Min 1Q Median 3Q Max
## -518211 -92081 -25344 93906 1346827
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1449376.3 52711.8 -27.496 <2e-16 ***
## Age 73775.1 1833.4 40.239 <2e-16 ***
## t_reb 176073.3 12343.7 14.264 <2e-16 ***
## TO -3466135.5 58297.0 -59.456 <2e-16 ***
## ppg -1705038.4 37376.3 -45.618 <2e-16 ***
## FG -94003.7 221599.3 -0.424 0.671
## started 36550.8 947.3 38.582 <2e-16 ***
## three 6614905.4 189533.8 34.901 <2e-16 ***
## two 5778475.5 188073.0 30.725 <2e-16 ***
## assist 540546.5 17210.5 31.408 <2e-16 ***
## block 784881.1 44083.0 17.805 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 249800 on 2434 degrees of freedom
## Multiple R-squared: 0.9891, Adjusted R-squared: 0.9891
## F-statistic: 2.212e+04 on 10 and 2434 DF, p-value: < 0.00000000000000022
Mean 2005 CBA salary
## [1] 4586623
RMSE CBA 2005
## [1] 137949.2
2011 CBA
##
## Call:
## lm(formula = salary ~ Age + t_reb + TO + ppg + FG + started +
## three + two + assist + block, data = cba3, subset = +steal +
## NumYears)
##
## Coefficients:
## (Intercept) Age t_reb TO ppg
## -18569818 344673 1172011 5956972 -4766604
## FG started three two assist
## 2041865 4608 19046415 9376796 -414443
## block
## 957183
##
## Call:
## lm(formula = salary ~ Age + t_reb + TO + ppg + FG + started +
## three + two + assist + block, data = cba3, subset = +steal +
## NumYears)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4406673 -998940 -382705 2056766 4672091
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -18569818 466372 -39.818 < 2e-16 ***
## Age 344673 14620 23.575 < 2e-16 ***
## t_reb 1172011 42364 27.665 < 2e-16 ***
## TO 5956972 242854 24.529 < 2e-16 ***
## ppg -4766604 114015 -41.807 < 2e-16 ***
## FG 2041864 1495278 1.366 0.172181
## started 4608 3814 1.208 0.227112
## three 19046415 1676491 11.361 < 2e-16 ***
## two 9376796 1622094 5.781 0.00000000817 ***
## assist -414443 111963 -3.702 0.000218 ***
## block 957182 168833 5.669 0.00000001564 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2177000 on 3122 degrees of freedom
## Multiple R-squared: 0.8894, Adjusted R-squared: 0.8891
## F-statistic: 2511 on 10 and 3122 DF, p-value: < 0.00000000000000022
Mean 2011 CBA Salary
## [1] 5158950
RMSE 2011 CBA
## [1] 1362086
For the 1999-2005 season we can see that the adjusted R squared got better at 0.9235. However, the adjusted R squared was best for the 2006-2011 season with an adjusted R squared of 0.9891. However, the model got worst at 0.8891 looking at the 2012-2018 data.
While, the model still does a good job for the latest CBA the adjusted R squared could become worse as we get closer to the next CBA contract in 6 years if both mutual sides opt out or in 10 years if both sides do not opt out.
So, can we use statistics like Billy Jeanne to build a perfect team on a tight a budget? The closest thing we have to a modern Moneyball team is the Houston Rockets. Darryl Morrey the general manager is an avid user of modern player analytics to gain advantages over the opposing team. The 2018 Houston rockets had a hugely successful 2018 season winning 65 games and almost making it to the NBA finals. This of course is only one instance where “Moneyball” NBA worked for a regular season.
The one important thing we need to remember is that baseball isn’t basketball. Basketball has a soft cap while baseball has a hard cap. A soft cap allows teams over the salary cap to have different exceptions and provisions to allow teams to have more flexibility.
If we did try to Moneyball a NBA team. The starting point of course what we should look at is how well the player shoots the three-point shot. The next thing of course is to find a taller player as it would seem through the draft these players are drafted the most. From there this player will take 25-35 percent of the salary. Looking at the model of most of the top teams in the NBA in the 19 seasons you would need another player that would be signed to the max. This would take close to 70 percent of the salary cap already.
However, this isn’t a video game and there are many limitations that we have to consider that the data cannot answer. The multiple linear regression model that we developed and had an R squared of 0.9116 might be invalid under the new CBA. This is evidenced by the new terms in the current CBA as well as the ridiculous increase in salary due to increase in Basketball Related Income. With the new CBA there is only really one valid season that can be tested. Which means on average 450 observations and because there are so few observations, we might be suffering from small sample bias where we are overfitting the model. Also, with the addition of the supermax this could make past data irrelevant in trying to apply Moneyball to the NBA.
The limits of our data can’t tell us player Chemistry with the team and other players. Some players will sign on a discount if they enjoy playing with other players or a specific team. Post career income plans that the NBA provides to its players. Also, if better benefits would decrease the percentage of basketball related income. This number decreased from 57 percent to 51 percent from the 2011 CBA to 2017 CBA. Other things that might affect salary are bonus stipulations team owners award their players. For example, staying healthy for a number of games per year, doing mentoring roles for younger players, winning the assist award or points per game award.
Hence, trying to use Moneyball in the NBA looks sexy and might work on paper, but with the uncertainty of future CBA agreements playing a huge role on player salary. In reality players aren’t robots or video game characters. There is a lot of uncertainty that is unaccounted for. While cleaning this data, there were countless teams who hoped to build a great team with players who put up great statistics. Only to be locked in an unfavorable contract due to player injury or because the player could not mesh in with the team culture. Even with a stretch provision to provide cap relief, we saw players like Carmelo Anthony and Chris Bosh take over huge amounts of the team’s cap space which crippled their franchise financially.